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Abstract 

In this article we demonstrate that the so-called bursting production of molecular species during 
gene expression may be an artifact caused by low time resolution in experimental data collection 
and not an actual burst in production. We reach this conclusion through an analysis of a two-stage 
and binary model for gene expression, and demonstrate that in the limit when mRNA degradation 
is much faster than protein degradation they are equivalent. The negative binomial distribution 
is shown to be a limiting case of the binary model for fast "on to off" state transitions and high 
values of the ratio between protein synthesis and degradation rates. The gene products population 
increases by unity but multiple times in a time interval orders of magnitude smaller than protein 
half-life or the precision of the experimental apparatus employed in its detection. This rare-and-fast 
one-by-one protein synthesis has been interpreted as bursting. 
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Understanding the origin of fluctuations at the single cell level and how organisms deal 
them to guarantee both developmental viability and evolutionary adaptation to a constantly 
changing environment conditions is a challenge of the post-genomic era l|,|2|. Often, stochas- 
ticity at the single cell level is due to the presence of biochemical reactants in low copy 



number inside the cell [3| and heterogeneous spatial distribution J4]. Experimental tech- 
niques to investigate these phenomena have been greatly enhanced by the use of fluorescent 
molecules and technology to track the spatial and temporal behavior of individual molecules 
5|, |6|. Despite the striking nature of the data these techniques provide, these advances do 
not necessarily give the full picture of the dynamics of events at the single cell level such as 
transcription and translation. One example is the measurement of the bursting production 
of molecules, defined as an incremental increase in mRNA or protein number greater than 
one at a given time. Bursting is often held to be the usual mechanism for the synthesis of 
gene products . As we show here, the inference of bursting molecular production may 
be an artifact due to a lack of sufficiently fine temporal resolution in experimental data. 
Also, from a modeling perspective the inference of bursting may be flawed due to a reliance 
on the shape of stationary probability distributions rather than an analysis of the underlying 
dynamical processes giving rise to them. 

The experimental observation of these jumps in molecular numbers has motivated sev- 
eral models for the prediction and fitting of observed data, e.g. by employing a Langevin 
approach (continuous) or the master equation (discrete). In the continuous case, stationary 
gamma distributions for molecular concentrations arepredicted along with discontinuous 
trajectories for the corresponding stochastic process S|. In the discrete framework, bursts 
appear in models for gene expression with two stochastic variables (so called two-stage mod- 
els with mRNA and protein), in the limit where the mRNA degradation rate is much larger 
that the degradation rate for protein (a common experimental finding). In these cases, the 
model predicted probability distributions are well described by a negative binomial proba- 



bility distribution {9] and simulations exhibit temporal bursting in protein numbers [6]. 

In this article we use a discrete modeling framework to show that the bursting limit ac- 
tually corresponds to a particular regime of a model of a switching gene between "on" and 
"off" states with a one step variation in the stochastic variable corresponding to protein 
numbers 1CH12| . The model we develop here is an approximation to a model which is inte- 



grable in both the stationary 
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13 1 and time-dependent regimes Q, Q with symmetries 



underlying the existence of analytical solution, and whose biological implications have been 



explored elsewhere 
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The steady state solution for the binary model, in the limit of rapid transitions from the 
"on" to the "off" state, approaches a negative binomial distribution that also describes the 
bursting model. Simulation of the binary model shows apparent bursting, but examined on 
a finer time scale reveals the protein numbers actually increase in a unitary fashion. This 
clearly suggests that the experimental detection of bursting in gene product numbers may 
be due to lack of temporal resolution in the data. Our results give clear guidelines for the 
conditions on the transcription and translation processes for artifactual bursts to appear. As 
such, the modeling establishes necessary conditions on the experimental temporal resolution 
necessary to establish the existence of true bursting. 

Gene expression is a cascade of first transcription to produce mRNA followed by trans- 
lation of that mRNA to protein. Thus it makes sense to take the system state variables 
to be the numbers of mRNA and protein molecules in a given cell. mRNA molecules are 
produced at a rate that is dependent on the interaction between the RNA polymerase and 
the promoter site. The number of mRNA molecules in the cell and their interactions with 
ribosomes controls the protein synthesis. In this framework, self regulation is introduced in 
the model by considering the transcription rate to be dependent on the number of protein 
molecules produced. It is worth noting that specific regulation of the state of the promoter 
site (active or repressed) is not taken into account on this model. 



This picture for non- regulated genes has been treated in the literature previously [18]. 

Let m and n denote, respectively, the number of mRNA and protein molecules. The 
probability of finding the system in a state (m, n), m, n > 0, at time t is denoted by P m ,n(t), 
while the synthesis rates for mRNA and protein are denoted by p° M , p} M and up, and the 
corresponding degradation rates are pu and pp. Then the evolution of the probability is 
governed by a master equation for two coupled birth-death processes: 

= (^M + ^M n )(Pm-l,n ~ Pm,n) 
+ Vpm{P m ^\ — P m ,n) 

,n Pm,n\ 

+ p P [(n + 1) -nP mtn ], (1) 

where we have assumed that the transcription rate is a function of the protein number (p^n), 



indicating positive self regulation, with the requirement that p? M ^ 0. We have assumed a 
linear dependence between the protein translation rate (z/p) and the number of available 
mRNA molecules in the cytoplasm. We have been unable to construct an analytic solution 
to the complete system of Eq. (|TJ) . However, exact quadrature is achieved in the limiting 
case when the mRNA degradation rate is much greater than the protein degradation rate 
(Pm/pp ^ 1) |s| so the mRNA lifetime is quite short relative to the protein lifetime. 

That suggests scaling the model parameters by the protein lifetime (~ Pp 1 ), which results 
in the dimensionless quantities 

o Pm i p\i Pm v P 

P = , P = , 7 = — , v= — . (2) 

Pp Pp Pp Pp 

The approximate Eq. (TjQ) becomes 

= [{n + l)P ,n+i - nP 0jTl ] 

- (p° + //n)P ,n + 7A,n, (3) 

dP 

~7^~~ = v(Pl,n-X - Pl,n) + [(" + l)Pl,n+l ~ ™A,n] 

dr 

+ + /i 1 n)P ,n - 7Pi,n, (4) 

where we have introduced the dimensionless time r = ppt scale and the approximations 
P m ,n ~ 0,m > 2 and + (j l 1 n)Pi !n /('jP2,n) ~ 1, for all n. Since our simplifying assumption 
implies that the mRNA lifetime is short relative to that of the protein, we would expect that 
mRNA probabilities will be peaked around zero for /i ,/! 1 of the same order as pu- This 
offers some justification for assuming that Eqs. ([3]) and (j4j) are valid for describing gene 
expression (Supplementary information). 

Eqs. ([3]) and (j4j) have the same form as the master equation for a binary gene with the 
state (1, n) (or (0, n)) as the active (or repressed) state of protein synthesis with rate v (or 
zero). The "on-off" switching rate is given in terms of the mRNA degradation rate 7 and 
the "off-on" transition depends on the mRNA synthesis rates, p° (for external regulation) 
or yU° + p}n (self regulation). To write the solutions of the model presented at the Eqs. ([3]) 
and OH), we define constants (a,b,9) as folows: 

6 = 1 1 ,1 ^ (5) 



1 + p 1 ' 1 + p 1 ' 1 + /i 1 ' 

where external regulation is recovered by setting 9 — 1. For simplicity we consider only the 
steady state solutions for Eqs. (j3J) and (j3J, Po, n ,Pi,n, and the probabilities for finding n 



proteins inside the cell, P n = Po >n + Pi >n , namely: 



Po,n = ^^7T%T M ( a + n,l + b + n, -u0), (6) 
Co n\ (1 + b) n 

Pi,n = A^7^rM(l + a + n, 1 + b + n, -*/0), (7) 
Co n! (1 + 6) n 

D _ Wri M(a + n,6 + n,-z/0), (8) 



n Cn\ (6) 

where M(a, 6, 2) denotes the Kummer M function [19j and the normalization constant 

C = M(a,6,z/(1 - 0)), 

assures conservation of probability X^n°=o(-^o,n + -pL,n) = 1- Note that for external regulation, 
C = M(a,6,0) = 1. 

The denominator in Eq. (jSJ), 1+fi 1 , should be interpreted as the total protein removal rate 
from the cytoplasm by degradation plus transcription stimulus. The constant a characterizes 
the rate of spontaneous (basal) mRNA synthesis relative to the protein removal rate and 
states a relation with the probability for finding one mRNA (pi), defined as Y^=oPi,m 
namely 

Pl = A M (a+ 1,6 +1,1/(1-0)), (9) 

and, for the external regulating gene, 9 = 1, it implies 

a 

Pl= b- 

Phenomenologically, 6 is a compound relation between the rate for a cycle of mRNA 

synthesis-degradation and the protein removal rate. Its role in determining the statistics of 

protein numbers is seen from the average number of protein molecules, (n) = pit/, and the 

variance relative to the average (Fano factor), a 2 /{n) = ({n 2 ) — (n) 2 )/ (n), given by 

a 2 a + lM(a + 2,6 + 2,z/(l-0)) 

(ra)~ +I/ 6+lM(a+l, 6 + 1, v{l - 6)) 

a M(a + !,&+!, 1/(1-0)) nm 
V b M(a, 6, v{\ - 9)) ' 1 j 



For the case where 9 = 1, we have 



a 2 vl — a/b 



In the limit 6 — > +00 (fast switching) and the parameters (a, 9, v) are finite, Eq. (fTUl) is 
equal to one and the distribution of protein is Poissonian. 



To get some intuition into this system, consider the steady state (or equilibrium) of the 
Eqs. ([3]) and (jl]), when probabilities are fixed with time, but the variables (m,n) change in 
time with the probabilities given by Eqs. 03), an d (JE])- b gives the average time for the 
system to complete one switching cycle - e.g. from off to on and back to off' again. Then 
the probabilities p\ and 1 — p 1 are the fractions of the total switching time that the system 
spends in the active and inactive states respectively. 



The transition from the dynamic to the stationary regime, as noted previously [13|, Il5| . 
has an approach to equilibrium characterized by two of the time scales of the model, pp 1 
and {pp + /xj^-) -1 ^" 1 . The former is the typical lifetime of the protein, whereas the second 
one is related to the switching. For rapid protein degradation, compared to the switching, 
the steady state is achieved after the equilibrium of on-off transitions that are slow and can 
result in super Poisson stationary distributions (eventually, bi-modality occurs with each 
peak related to one state of the system). On the other hand, when protein degradation 
dominates, and there is fast switching, the distributions are uni-modal. In that case, the 
gene regulatory mechanism (e.g. if binary or constitutive) is indistinguishable by simple 
protein counting. 

This reasoning suggests that bursting would occur for systems with a large value of v 
and pi ~ 0. Biologically, this would mean that the mRNA number is mostly zero during 
an entire switching cycle. For a p\ fraction of that cycle, there will be one mRNA that is 
rapidly translated (at rate v) and thus several unitary increments in n take place during 
a very short time. This will appear to be a single near-instantaneous increase in protein 
number by more than one. A mechanism for a rapid increase in n from one mRNA is the 
binding of several ribosomes to the mRNA. 

Mathematically, the negative binomial distribution is assumed to describe a random 
variable characterizing a bursting process. We can show (Supplementary information) that 
the negative binomial distribution is a particular case of the probabilities of the Eq. ([8]) at 
the limit of b, v — > oo with their ratio 

6=\, (12) 

kept finite, namely: 

" n\ \1 + Se) \ 1 + 60 J ' ( ' 

where (a) n = a(a + 1) . . . (a + n — 1), (a)o = 1. For the self-regulating case, an approximate 
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negative binomial distribution occurs for 9 ~ 1, which implies weak induction of mRNA syn- 
thesis by proteins, (/i 1 << 1). For the externally regulated gene, 6=1, and the probabilities 
at equation above become: 



that is the negative binomial distribution 



+ 5J \l + S9 / 

We illustrate our results in FIG.[l]where the left hand column is for the external regulated 
gene and the right hand column is for the self regulating case. FIG. [TJA and [TJD are 
the steady state probability distributions as obtained from the expression of the binary 
(Eq. [8]) and negative binomial (Eq. H4"|) models. For the parameter values we choose, 
inspection shows high agreement with the externally regulated while a slight difference 
appears for the self-regulating gene. The corresponding trajectories are obtained from the 
binary model, with the protein numbers shown in FIG. [TJB and [TJE. The apparent bursts 
appear explicitly at protein half-life time scale. However, an expansion of the time scale 
reveals that the protein synthesis is occurring one by one. Finally, the corresponding mRNA 
number dynamics is shown in FIG. [HC and[UF. As expected, it is switching between very 
short time intervals with one mRNA and long intervals with no mRNA. 

Experimentally, the temporal resolution necessary for avoiding anomalous bursting de- 
tection should be of the order of the average time for translation of one protein, ~ 1/Vp. In 



what follows, we shall consider the system approached in Ref. [20|] to show an example of a 
measurement of apparent bursting. In their work, the authors monitored the expression of 
the /5-gal protein under the control of the lac promoter. They have detected the occurrence 
of burstings in protein numbers and measured the average bursting size to be of ~ 8 pro- 
teins synthesized per burst. Our aim is to calculate the protein synthesis rate of the system 
reported in Ref. 20] using the average bursting size calculated by the authors. We shall 
employ our approximation at Eqs. fl3]) and (j4j) and estimate the necessary time resolution 
for avoiding the measurement of apparent bursting. 

We start by setting the average bursting size in terms of the rates of the Eq. ([[]). In 



literature js], the average bursting size is usually given by the parameter 5, of the negative 
binomial probability distribution at the Eqs. ( Fl3l) and ( f!4"l) . Therefore, the protein synthesis 
rate, u P at the Eq. ([1]), can be written as a function of 5 as follows: 

v P = bpp ■ — — , (15) 

Pp + Pm 



that is deduced by combining the Eqs. (|T2|) . ([5]), and ([2]). 

To proceed calculating up, we set p} M = - and assume external regulation - since the lac 
promoter interacts with the Lac repressor protein that is not encoded in the lac operon genes. 
Then, the expression for calculating the protein synthesis rate at the Eq. (I12p is reduced 
to Up = S(fi° M + Pm). The values of the remining unknown constants, mRNA synthesis and 
degradation, are determined in terms of experimental measures. 

The mRNA degradation rate of the /3-gal mRNA - pu - is taken to be ~ 0.1 min -1 based 



on data reported in Ref. 21 1 



We use the data provided in Ref. {20 1 to estimate the mRNA synthesis rate {p° M ) at 
10 -3 min" 1 . This number is achieved dividing the average frequence of bursting of 0.16 



per cell cycle by the average period of a cell cycle, 145 min (both data from Ref. 20(). The 
bursts of proteins occur whenever one mRNA arrives at the cytoplasm. Therefore, we can 
convert the average bursting frequence into the average mRNA synthesis rate. 

Based on the values of 5, pP M , /ij^, and pu above we compute the protein synthesis rate 
as up ~ 1 min -1 . Thus, the time scale for the synthesis of one protein is ~ 1 min or, 
\n.2/up ~ 0.5 min in case of exponential growth of the protein population. Such time 
scales are smaller than the time resolution of protein detection of 4 min reported in Ref. 



201 ] . Intuitively, one might expect an average of at least 4 protein synthesis during the 
temporal range of the experimental resolution. The detection of an increase greater than 
one in protein population should be interpreted as a bursting. However, under the conditions 
we are considering in this manuscript, an experimental time resolution of ~ 1 min would 
re-establish a one-by-one protein population increase. 

We also stress another aspect of experimental measurements: the sampling time. Usu- 
ally, this is the time interval between two gatherings of cells from the cell culture. In the 
experiments presented in Ref. [201 ]. the sampling time is 20 s. Note that it appears to be 
enough to detect the one- by-one protein increment. However, the time resolution for pro- 
tein detection is obtained indirectly, from fluorescence measurements, which results a 4 min 
precision. Therefore, this is not enough to detect individual proteins. 

It is worth to discuss the phenomenology of the burst like behavior reported in Ref. 
[20! ] . Theoretically, it relates to the value of the ratio u = up/pp that is a large number, 
>> 1. Then, whenever an mRNA appears in the cytoplasm, a plethora of proteins is fastly 
synthesized while their degradation is very slow. In a plot of the protein number versus time, 
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that condition appears as a fast increment in protein population, followed by a plateau, if 
the experiment stands shorter than the protein half-life time. 

In that description of protein numbers inside the cell in terms of the negative 

binomial distribution is appropriate. However, it must be emphasized that the fitting of 
the measured histogram by a known probability distribution does not imply the occurrence 
of the stochastic process from where the distribution is derived. The predictive power of 
a model based on such approach might be lowered. In that sense, our discussion is an 
increment in the capability of interpreting experimental data. That is done establishing the 
time resolution necessary for experimental demonstration of bursting. 

We establish two possible sceneries for the occurrence of real burst of proteins. It is 
assumed experimental time resolution of the order of ~ 1/vp and the measurement of a 
greater than one instantaneous increment in protein population. The first scenery requires 
the existence of only one mRNA in the cytoplasm. A possible mechanism to underlie this 
effect is: multiple polipeptide chains present in cytoplasm that were translated individually 
and start their functional activity simultaneously. Note that that implies a delay between 
the translation process and the protein folding. In our second bursting scenery, there is 
abundant fast degradating mRNA's in the cytoplasm that can be translated synchronously 
in an one-by-one fashion. Hence, multiple proteins could be synthesized simultaneously in 
a time scale of the order of ~ 1/vp. 

Our results also suggest a picture of gene expression where the bursting (or burst-like) 
dynamics corresponds to one among other possible behaviors. Different regimes of gene 
expression are possible depending on the specific relations among the effective rates of the 
reactions participating of a gene network. For example, the model at Eq. (OQ) has a precise 
biological interpretation. Its approximation in terms of the binary model, Eqs. (j3J) and @, 
shows the utility of the "on" and "off" model for the analysis of gene products synthesis. In 
terms of probability distributions one expect that, besides the negative binomial, the gene 
products should also satisfy the probabilities based on Eqs. (EJ), (JTj) and flHJ). 

These probabilities satisfy the Eq. (CQ) when the probability to find more than one mRNA 
in the cytoplasm is neglegible. In this sense, one might provide further approximations to 
the solution of the Eq. (JTJ) for neglegible probability of detecting three, four mRNA's in the 
cytoplasm, respectively, in terms of ternary, quaternary models. Different insights in the 
workings of gene expression could also be provided by this kind of approach. 
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In this manuscript we are proposing a theoretical framework, based on the binary model 
to gene expression, that generalizes the negative binomial distribution for the description of 
the stochasticity in gene products number. The burst like behavior occurs in a well defined 
regime, when the ratio between synthesis and degradation rates is of the order of 10 3 and the 
synthesis of gene products very rare. As we predict from our model, measurements aiming 
to detect the one-by-one increments in gene products number must have temporal resolution 
of the order of the synthesis rate (1/z/p), e.g. in conditions reported in Ref. 20j that would 
imply a time resolution of ~ 10 — 60 s. 
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